Method and system for estimating a number of users of a website based on lossy compressed data

ABSTRACT

The invention relates to a method and system for estimating a number of users of a website. According to the method, it is repeatedly determined that the website is accessed by an entity and data dependent on the entity is determined and stored. The stored data is repeatedly compressed using a lossy compressing algorithm and based on the compressed data a number of users of the website is estimated.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 12/746,005, filed 21 Sep. 2010, and claims priority therefrom under 35 U.S.C. §120. Application Ser. No. 12/746,005 is the U.S. National Stage entry under 35 U.S.C. §371 of international application PCT/EP2008/010027, filed 26 Nov. 2008, which claims priority to European patent application 07023415.8, filed 4 Dec. 2007. All the listed applications are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present invention relates to a method and system for estimating a number of users of a website.

BACKGROUND

The number of users of a website is an important parameter to determine the success of the webpage and often has a direct influence on advertising revenues.

From the prior art, methods are known to identify entities that access a webpage. In general, either the IP address of the accessing entity is used or the entity is identified using a cookie. Furthermore, some webpages require that the user logs in by inputting his login name and a password. In this case, the accessing entity can be easily identified via the login.

In this connection it is important to understand that it is nearly impossible to be sure about the user who sends a request having an IP address, who uses a computer on which a cookie is stored, or who logs in at a website. The IP address may be rewritten by a firewall, the computer may be used by more than one person, or somebody may use the login of somebody else. Nevertheless, the identified entities are often used to estimate the number of users of a website because they are considered to be a reasonable proxy for the users.

Once the entity is identified, records are stored that log the access of the entity. At the end of an interval, an analysis tool reads the stored records and counts the number of accessing entities, which is deemed to be a good estimate of the number of accessing users.

One problem with this procedure is that all the records need to be stored until the end of the reporting interval. For small websites, this drawback is acceptable. However, for large websites, the amount of stored data becomes excessive.

To cope with this problem, sampling has been proposed in the prior art. The basic principle of sampling is that an identity of an accessing entity is stored only for each x-th (e.g. each 10th) access. At the end of the reporting interval, the stored records are analysed and a number of accessing entities is estimated based on the sampled records.

However, this sampling method has the drawback that it is only precise when the number of events, e.g. logged accesses, in the reporting interval is large. While long intervals like entire years will typically contain enough events, the same might not be true for hourly or daily intervals.

Furthermore, if the sampling percentage is fixed before the data is sampled, there is the danger that the sampling percentage is chosen either too low, leading to inaccuracies, or too high, thereby degrading performance.

Therefore, it is an object of the invention to provide a method and a system for estimating a number of users of a website that are efficient and that allow the number of users to be estimated with an acceptable precision.

This object is accomplished by the subject-matter of the independent claims. Preferable embodiments are specified by the dependent claims.

SUMMARY OF THE INVENTION

The invention comprises a method for estimating a number of users of a website that comprises the steps of repeatedly determining that the website is accessed by an entity, in reaction to the access of the website, determining data dependent on the entity that accesses the website and storing said data, lossy compressing said stored data, and estimating a number of users of the website based on the compressed stored data.

This means that according to the invention, e.g. data about the identity of accessing entities is stored and said stored data is later on compressed using a lossy compression algorithm. Based on the lossy compressed data, it is possible to estimate a number of users of the website.

This method has the advantage that due to the lossy compression algorithm, less data needs to be stored and analysed. A small loss of precision is acceptable, because the determination of data dependent on the entity is already a source of inaccuracies.

In one preferred embodiment, the method according to the invention comprises the steps of determining a maximum amount of data to be stored, determining a set of numbers, repeatedly mapping the data dependent on the entities that access the website onto a number of the set of numbers according to a mapping function and storing the resulting numbers. According to this embodiment, when the maximum amount of data to be stored is exceeded, the set of numbers is pruned and the stored numbers that correspond to the pruned numbers of the set of numbers are deleted. Later on, a relation between the size of the set of the not-pruned numbers and the size of the set of the pruned numbers is determined, and the number of users of the website is estimated based on the pruned stored data and said relation.

In this way, it is possible to reduce the amount of stored data and to nevertheless estimate the number of users of the website.

In another preferred embodiment, the method comprises the steps of providing an array of bits in a storage, repeatedly mapping the data dependent on the entities that access the website onto one bit of the array of bits according to a mapping function and activating the bit of the array of bits onto which the data was mapped, and estimating the number of users of the website based on the activated bits of the array of bits.

This embodiment has the advantage that data that can be used to estimate a number of users of the website is stored efficiently. Preferably, only single bits are set in response to accesses. In this way, the required amount of storage is often lower compared to storing the exact identities of accessing entities as numbers.

Preferably, the method comprises the steps of determining a period of time for which the number of users is to be estimated, dividing the period of time into intervals, providing a plurality of arrays of bits corresponding to the number of intervals, associating each array of bits of the plurality of arrays of bits with one interval, such that for each interval one array of bits is used. In each interval, the bits of the associated array of bits are activated according to the data dependent on the entities that access the website during the interval. In this way, each array of bits allows to estimate the number of users of the website that accessed the website during the associated interval. After the end of the period of time, an array of bits is determined that allows to estimate the number of users of the website that accessed the website during the period of time. This array of bits may be determined by combining the arrays of bits for each interval using a logical OR-operation between the bits of the arrays of bits having the same significance.

With this method, data about entities that have accessed the webpage during each interval is stored efficiently, and the stored data for each interval is combined efficiently by performing a simple logical OR-operation.

In one embodiment, the mapping function maps the data dependent on the entities that access the website onto a number of the set of numbers or onto one bit of the array of bits based on a uniform distribution. In an alternative embodiment, a non-linear mapping function is employed, such that e.g. the number of data items dependent on the entities that is mapped onto one bit varies in dependence on a magnitude of the data item. This makes the method more scalable in comparison to the usage of a uniform distribution.

In one embodiment of the non-uniform distribution, the mapping function may be composed of two steps: First, data dependent on the accessing entity, e.g. a user identification, is mapped to an integer h using a hash function. Hash functions are described in the prior art that ensure that the resulting integers are evenly distributed between 0 (included) and a constant M (excluded). Afterwards, a non-linear function may be applied to the resulting hash value.

In a preferred embodiment, during the step of estimating the number of users of the website based on the activated bits of the array of bits a maximum likelihood analysis is performed to determine the number of accessing entities that is most likely to reproduce the given pattern of activated bits of the array.

In a preferred embodiment, the step of estimating the number of users of the website based on the activated bits of the array of bits may comprise the steps of providing an estimation function relating a number of activated bits with a number of users, determining a number of activated bits by counting the number of activated bits of the array of bits, and estimating the number of users of the website based on the number of activated bits of the array of bits and the estimation function. In this embodiment, the position of an activated bit is irrelevant, such that the estimation can be efficiently performed.

Preferably, the integer for which the expected number of activated bits matches the observed number of activated bits most closely is chosen as the estimated number of users. Because this computation is expensive to perform exactly, a preferred approach is to precompute the expected number of activated bits for certain numbers of users and to use this data for interpolating the actual number of users based on the observed number of activated bits.
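The precompute-and-interpolate approach can be sketched as follows for the case of a uniform mapping function. The sketch is illustrative only: the function names are hypothetical, and the closed-form expectation S·(1 − (1 − 1/S)^n) for n distinct entities mapped independently and uniformly onto S bits is a standard result that is assumed here rather than taken from this description.

```python
import bisect


def expected_active_bits(n_users, s_bits):
    """Expected number of set bits after n_users distinct entities are
    mapped uniformly and independently onto s_bits positions (assumption)."""
    return s_bits * (1.0 - (1.0 - 1.0 / s_bits) ** n_users)


def build_table(s_bits, user_counts):
    """Precompute (expected active bits, users) pairs for selected user counts."""
    return [(expected_active_bits(n, s_bits), n) for n in sorted(user_counts)]


def estimate_users(active_bits, table):
    """Linearly interpolate the user count for an observed number of set bits.

    Observations outside the table are clamped to its first or last entry.
    The table entries must have strictly increasing expected bit counts.
    """
    xs = [expected for expected, _ in table]
    i = bisect.bisect_left(xs, active_bits)
    if i == 0:
        return float(table[0][1])
    if i == len(table):
        return float(table[-1][1])
    (x0, n0), (x1, n1) = table[i - 1], table[i]
    return n0 + (n1 - n0) * (active_bits - x0) / (x1 - x0)


table = build_table(1000, [0, 100, 200, 500, 1000, 2000])
estimate = estimate_users(300, table)  # lies between 200 and 500 users
```

A denser table reduces the interpolation error at the cost of a longer precomputation.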

Preferably, the data dependent on the entity is an identification of the entity.

The method according to the invention may be efficient and may allow to estimate the number of users with an acceptable precision for all interval lengths with a uniform algorithm and potentially without the need to estimate sampling parameters precisely.

Furthermore, the invention comprises a system for estimating a number of users of a website. The system may comprise means for determining that the website is accessed by an entity, means for determining data dependent on the entity that accesses the website, in reaction to the access of the website, means for storing said data, means for lossy compressing said stored data, and means for estimating a number of users of the website based on the compressed stored data.

The system according to the invention may have the same advantages as the corresponding method according to the invention.

In one embodiment, the system comprises means for determining a maximum amount of data to be stored, means for determining a set of numbers, means for repeatedly mapping the data dependent on the entities that access the website onto a number of the set of numbers according to a mapping function, means for storing the resulting numbers, means for pruning the set of numbers and deleting the stored numbers that correspond to the pruned numbers of the set of numbers, when the maximum amount of data to be stored is exceeded, means for determining a relation between the size of the set of the not-pruned numbers and the size of the set of the pruned numbers, and means for estimating the number of users of the website based on the pruned stored data and said relation.

In an alternative embodiment, the system according to the invention comprises a storage storing an array of bits, means for repeatedly mapping the data dependent on the entities that access the website onto one bit of the array of bits according to a mapping function and for activating the bit of the array of bits onto which the data was mapped, and means for estimating the number of users of the website based on the activated bits of the array of bits.

Preferably, the system comprises means for determining a period of time for which the number of users is to be estimated, means for dividing the period of time into intervals, means for providing a plurality of arrays of bits corresponding to the number of intervals, means for associating each array of bits of the plurality of arrays of bits with one interval, such that for each interval one array of bits is used, means for activating in each interval the bits of the associated array of bits according to the data dependent on the entities that access the website during the interval, such that each array of bits allows to estimate the number of users of the website that accessed the website during the associated interval, and means for, after the end of the period of time, determining an array of bits that allows to estimate the number of users of the website that accessed the website during the period of time by combining the arrays of bits for each interval using a logical OR-operation between the bits of the arrays of bits having the same significance.

In one embodiment, the means for repeatedly mapping comprises a mapping function that maps the data dependent on the entity onto a number of the set of numbers or onto one bit of the array of bits based on a uniform distribution. In an alternative embodiment, the means for repeatedly mapping comprises a mapping function that is non-linear, such that the number of data items dependent on the entities that is mapped onto one bit varies in dependence on a magnitude of the data item.

In the case of the non-linear distribution, the mapping function may be composed of two steps: First, data dependent on the accessing entity, e.g. a user identification, is mapped to an integer h using a hash function. Hash functions are described in the prior art that ensure that the resulting integers are evenly distributed between 0 (included) and a constant M (excluded). Afterwards, a non-linear function may be applied to the resulting hash value.

Preferably, the means for estimating the number of users of the website based on the activated bits of the array of bits uses a maximum likelihood analysis to determine the number of accessing entities that is most likely to reproduce the given pattern of activated bits of the array.

In a preferred embodiment, the means for estimating the number of users of the website based on the activated bits of the array of bits comprises an estimation means comprising an estimation function relating a number of activated bits with a number of users, means for determining a number of activated bits by counting the number of activated bits of the array of bits, and means for estimating the number of users of the website based on the number of activated bits of the array of bits and the estimation function.

Preferably, the means for estimating the number of users of the website chooses as the estimated number of users an integer for which the expected number of activated bits matches the observed number of activated bits most closely. Because this computation is expensive to perform exactly, a preferred approach is to pre-compute the expected number of activated bits for certain numbers of users and to use this data for interpolating the actual number of users based on the observed number of activated bits.

Preferably, the data dependent on the entity is an identification of the entity.

The method and the system according to the invention may be implemented using software. Accordingly, the invention comprises a computer program product that comprises a computer readable medium and a computer program recorded therein in form of a series of state elements corresponding to instructions which are adapted to be processed by a data processing means of a data processing apparatus such that a method according to the invention is performed or a system according to the invention is formed on the data processing means.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments and further details of the present invention will be explained in the following with reference to the figures.

FIG. 1 shows one embodiment of a system for estimating a number of users of a website.

FIG. 2 shows a process that generates access data according to one embodiment of the present invention.

FIG. 3 shows one embodiment of a data compression and user estimation process according to the invention.

FIG. 4 shows another embodiment of a data compression and user estimation process according to the invention.

FIG. 5 shows an embodiment of the method according to the invention.

FIG. 6 shows an embodiment of the system according to the present invention.

DETAILED DESCRIPTION OF THE ILLUSTRATED EMBODIMENTS

FIG. 1 shows one embodiment of a system according to the present invention. A web server 101 receives accesses of the website 102 and generates based on the accesses an access log 103. The access log is analysed by a web analytics engine 104. The web analytics engine 104 comprises a log parser 105 that parses the access log 103 and generates access data that is stored in a data base 106. The access data 106 is regularly compressed using a lossy compression algorithm. A means for lossy compressing the data 107 generates a data basis 108 for the estimation of a number of users of the website. Based on this data basis, a number of users of the website is estimated 109.

FIG. 2 shows a process that generates access data 106 that may be used in one embodiment of the method according to the present invention. In step 201, it is determined that the website is accessed by an entity. In step 202, an identification of the entity that accesses the website is determined. In step 203, data dependent on the identification of the entity is stored and the process resumes with step 201.

FIG. 3 shows one embodiment of a lossy compression procedure in combination with a user estimation step according to the present invention. In step 301, the procedure checks whether the end of an interval for which the accessing entities are logged is reached. If the end of the interval is reached, the stored data is lossy compressed in step 302. Step 303 checks whether the end of the period of time is reached. In other words, it is checked whether another interval follows. If this is the case, the process resumes with step 301. If the end of the period of time is reached, a number of users is estimated in step 304 based on the stored data.

FIG. 4 shows a data compression and user estimation procedure according to one embodiment of the method according to the invention. It is assumed that a process is running that continuously generates access data like e.g. the process shown in FIG. 2. In step 401, it is checked whether the maximum amount of stored data is exceeded. If this is the case, the set of numbers is pruned in step 402 and the stored numbers that correspond to the pruned numbers of the set of numbers are deleted in step 403. Afterwards, it is checked whether the end of the period of time is reached in step 404. If the end of the period of time is not reached, the procedure resumes with step 401. If the end of the period of time is reached, a relation is determined. As shown in step 405, the relation may be for example the size of the set of the pruned numbers plus the size of the set of the not-pruned numbers, i.e. the overall size of the set of numbers at the beginning, divided by the size of the set of the not-pruned numbers. Based on this relation and the stored data, a number of users of the website is estimated in step 406.
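The relation of step 405 and one possible estimate for step 406 may be written compactly as follows. The symbols are notation introduced here for illustration only: $P$ denotes the set of pruned numbers, $N$ the set of not-pruned numbers, $D$ the set of stored numbers remaining after pruning, $r$ the relation, and $\hat{n}$ the estimated number of users. Multiplying the count of remaining stored numbers by the relation is an assumption that presumes the mapping function distributes entities evenly over the set of numbers.

$r = \frac{|P| + |N|}{|N|}$

$\hat{n} = r \cdot |D|$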

The following example further illustrates this procedure. It is assumed that the maximum amount of data to be stored is set to be 5, while the set of numbers onto which the identifications of the entities are mapped consists of the numbers from 0 to 99. It is assumed that in a first interval, data is stored that indicates that the entities 1, 12, 17, 19, 33, 43, 67, 82, 83, 92 accessed the webpage in the interval 1. Since the amount of stored data exceeds 5, the lower half of the set of numbers is pruned, i.e. the numbers from 0 to 49. The numbers 67, 82, 83, and 92 remain.

In the interval 2, the entities 1, 3, 72, and 73 access the webpage. The numbers 1 and 3 belong to the pruned half and are not stored. The numbers 72 and 73 are stored, such that the entities 67, 72, 73, 82, 83, and 92 are stored. This again exceeds the maximum amount of data to be stored, such that the third quarter of the set of numbers is pruned, i.e. the numbers from 50 to 74. The numbers 82, 83, and 92 remain. These are three entities. It is known that only one quarter of the complete set of numbers remains. Therefore, the number of accessing entities is multiplied by 4 to estimate the number of users that accessed the webpage. 3 times 4 yields 12. The exact number of distinct accessing entities was 13, since entity 1 accessed the webpage in both intervals. This means that the described procedure provides an acceptable estimation for the actual number of users of the website, while using only a fraction of the storage that would be necessary to store the identities of all accessing entities.

In this example, the maximum amount of data was chosen to be 5 to provide a concise example. In practice, values of 10000 or more will lead to better precision without increasing the memory demand unduly.
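The example above can be reproduced with a short sketch. The class and method names are hypothetical, and two details are assumptions chosen to match the example: pruning always removes the lower half of the range that still remains, and compression is triggered at the end of each interval.

```python
class PrunedCounter:
    """Stores mapped numbers only from the not-pruned part of [0, size)."""

    def __init__(self, size, max_stored):
        self.size = size              # size of the complete set of numbers
        self.max_stored = max_stored  # maximum amount of data to be stored
        self.low = 0                  # numbers below `low` have been pruned
        self.stored = set()

    def add(self, number):
        """Record an access unless the number falls into the pruned range."""
        if number >= self.low:
            self.stored.add(number)

    def compress(self):
        """Prune the lower half of the remaining range while over the limit."""
        while len(self.stored) > self.max_stored and self.size - self.low > 1:
            self.low += (self.size - self.low) // 2
            self.stored = {n for n in self.stored if n >= self.low}

    def estimate(self):
        """Remaining count times (overall size / not-pruned size)."""
        relation = self.size / (self.size - self.low)
        return len(self.stored) * relation


counter = PrunedCounter(size=100, max_stored=5)
for interval in ([1, 12, 17, 19, 33, 43, 67, 82, 83, 92], [1, 3, 72, 73]):
    for entity in interval:
        counter.add(entity)
    counter.compress()            # lossy compression at the end of the interval
print(counter.estimate())         # prints 12.0
```

After the first interval the sketch keeps 67, 82, 83, and 92; after the second it keeps 82, 83, and 92 with one quarter of the range remaining, which gives the estimate of 12.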

According to a further embodiment, it is possible to aggregate access data on multiple scales at the same time. For example, it may be necessary to prepare daily, weekly, and yearly statistics. In that case, the amounts of pruning applied on the different scales may vary. Especially the shorter intervals will normally be pruned less. If existing data from shorter intervals is combined into a larger interval, the combined data set must be pruned at least as strongly as the most pruned interval.
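Combining data sets that were pruned to different degrees might look as follows. This is a sketch under stated assumptions: each data set is represented by a hypothetical pair `(low, stored)`, where all numbers below `low` have been pruned, and all data sets use the same set of numbers and the same mapping function.

```python
def merge_pruned(a, b):
    """Merge two (low, stored) states over the same set of numbers.

    The result adopts the stronger pruning of the two inputs, so numbers
    that one input could never have recorded are dropped from the other.
    """
    low = max(a[0], b[0])
    stored = {n for n in a[1] | b[1] if n >= low}
    return low, stored


day_one = (50, {67, 82, 83, 92})   # lower half pruned
day_two = (75, {82, 92, 97})       # lower three quarters pruned
week = merge_pruned(day_one, day_two)
```

If the merged data set still exceeds the maximum amount of data to be stored, it would be pruned further in the usual way.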

FIG. 5 shows another embodiment of the method according to the present invention. In step 501, it is determined that the website is accessed by an entity. In step 502, an identification of the entity that accesses the website is determined. The identification of the entity is then mapped onto one bit of a first array of bits according to a mapping function in step 503. The bit of the array of bits onto which the identification was mapped is activated accordingly (step 504).

In a preferred embodiment, the following mapping function H(x) is used:

${H(x)} = {{- K}\; \ln \; \frac{x + C}{M + C}}$

x denotes the hashcode derived from an accessing entity. It is characterized by a number taken from an interval from 0 to M−1, e.g. by a 64-bit-value if M=2⁶⁴. For each interval, an array of bits having S bits is provided. The larger S is chosen, the higher is the precision of the generated statistics. Reasonable values for S lie around 1 million. The mapping function H(x) maps each entity onto a value between 0 and S−1. The mapping function possesses the property that some values are chosen more often than others, i.e. the mapping function is non-linear. C and K are constants that can be derived from the expected maximal value E of the statistical parameter to be examined and the values of S and M:

$C = \frac{M}{E - 1}$

$K = \frac{S}{\ln E}$

With this mapping function, a hash bucket H(x) is calculated for each entity and the corresponding bit is set in the array of bits. For combining the arrays of bits of the intervals, the arrays of bits can be simply combined using a logical OR-operation.
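A minimal sketch of the mapping function with $C = M/(E - 1)$ and $K = S/\ln E$ is given below. The function and parameter names are hypothetical. With these constants $(M + C)/C = E$, so the formula yields exactly S for x = 0; flooring the real-valued result and clamping it to the range 0 to S−1 is therefore an assumption made here to obtain a valid bit index.

```python
import math


def make_mapping(m, s_bits, e_max):
    """Build H(x) for hash codes x in [0, m), using C = M/(E-1), K = S/ln E."""
    c = m / (e_max - 1)
    k = s_bits / math.log(e_max)

    def h(x):
        value = -k * math.log((x + c) / (m + c))
        # Assumption: floor the real value and clamp it into [0, s_bits - 1].
        return min(s_bits - 1, max(0, math.floor(value)))

    return h


h = make_mapping(m=2**32, s_bits=1000, e_max=10**6)
bits = bytearray(1000)            # one byte per bit, for readability only
for hash_code in (0, 1000, 10**6, 2**32 - 1):
    bits[h(hash_code)] = 1        # activate the bit selected by H(x)
```

Small hash codes are spread over many high bit positions, while the large majority of hash codes share the low positions, which is the non-linear behaviour described above.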

In another preferable embodiment of the mapping function, C for the mapping function H(x) is calculated as:

$C = \frac{S}{E \cdot \ln\left(\frac{1}{C}\right)}$

For determining C, a fixed point must be calculated that can be estimated for example with

$C = \frac{S}{E \cdot \ln\left(\frac{E}{S}\right)}$

Then, K is set as follows:

$K = \frac{S}{\ln \left( \frac{M + C}{C} \right)}$

In this embodiment, the parameter x must be uniformly distributed. Techniques for calculating x from an identification of an entity are known, for example a cryptographic hash function can be used, but also simpler hash functions will normally give satisfactory results.

The described mapping function exhibits in a wide range a scale-invariant quality of the estimation, if E is chosen sufficiently high. Even if E is orders of magnitude too high, the resulting precision will degrade only moderately.

In step 505 shown in FIG. 5, it is determined whether the end of the first interval is reached. If this is not the case, the procedure resumes with step 501. Otherwise, the procedure proceeds to step 506, in which it is determined again that the website is accessed by an entity. In step 507, an identification of the accessing entity is determined, and in step 508, this identification of the entity is mapped onto one bit of a second array of bits according to the above described mapping function. The corresponding bit of the second array of bits is activated accordingly in step 509. In step 510, it is determined whether the end of the second interval is reached. If the end is not reached yet, the procedure resumes with step 506. Otherwise, the procedure proceeds to step 511, in which the first array of bits and the second array of bits are combined using a logical OR-operation, wherein the result is written into the first array of bits. This means that after step 511, the first array of bits indicates the identifications of the entities that have accessed the website during the first and the second interval.

Afterwards, it is checked whether the end of the period of time has been reached in step 512. If there are further intervals, the procedure proceeds to step 513, where the second array of bits is deleted. Then, the procedure jumps back to step 506. If the end of the period of time is reached in step 512, the procedure proceeds to step 514 in which the number of the users of the website is estimated based on the first array of bits.
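The combination of steps 511 to 513 can be sketched as follows. The helper names are hypothetical, and packing eight bits into each byte of a `bytearray` is an implementation choice made for this illustration, not something prescribed by the description.

```python
def activate(bits, index):
    """Set the bit at position `index` in a packed bit array."""
    bits[index // 8] |= 1 << (index % 8)


def combine_or(first, second):
    """OR `second` into `first` in place, bit position by bit position."""
    if len(first) != len(second):
        raise ValueError("bit arrays must have the same size")
    for i, byte in enumerate(second):
        first[i] |= byte


def count_active(bits):
    """Count the activated bits, e.g. as input to an estimation function."""
    return sum(bin(byte).count("1") for byte in bits)


first = bytearray(2)    # 16 bits for the first interval
second = bytearray(2)   # 16 bits for the second interval
activate(first, 3)
activate(second, 3)     # same entity again: maps onto the same bit
activate(second, 9)
combine_or(first, second)          # step 511
second = bytearray(len(first))     # step 513: clear for the next interval
print(count_active(first))         # prints 2
```

An entity that accesses the website in both intervals activates the same bit position in both arrays, so the OR-operation does not count it twice.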

FIG. 6 shows one embodiment of a system according to the invention. The system for estimating a number of users of a website 601 comprises means for determining that the website is accessed by an entity 602, means for determining an identification of the entity that accesses the website 603, means for storing data dependent on the identification of the entity 604, means for lossy compressing the stored data 605, and means for estimating a number of users of the website based on the stored data 606.

The specifications and drawings are to be regarded in an illustrative rather than a restrictive sense. It is evident that various modifications and changes may be made thereto, without departing from the scope of the invention as set forth in the claims. It is possible to combine the features described in the embodiments in a modified way for providing additional embodiments that are optimized for a certain usage scenario. As far as such modifications are readily apparent for a person skilled in the art, these modifications shall be regarded as implicitly disclosed by the above described embodiments.

1. A method for estimating a number of users of a website, the methodcomprising: repeatedly determining that the website is accessed by anentity; in reaction to the access of the website, determining datadependent on the entity that accesses the website and storing the data;lossy compressing the stored data; and estimating a number of users ofthe website based on the lossy compressed stored data.
 2. The method ofclaim 1, further comprising: determining a maximum amount of data to bestored; determining a set of numbers; repeatedly mapping the datadependent on the entities that access the website onto a number of theset of numbers according to a mapping function and storing the resultingnumbers; when the maximum amount of data to be stored is exceeded,pruning the set of numbers and deleting the stored numbers thatcorrespond to the pruned numbers of the set of numbers; determining arelation between the size of the set of the not-pruned numbers and thesize of the set of the pruned numbers; and estimating the number ofusers of the website based on the pruned stored data and the relation.3. The method of claim 1, further comprising: providing an array of bitsin a storage; repeatedly mapping the data dependent on the entities thataccess the website onto one bit of the array of bits according to amapping function and activating the bit of the array of bits onto whichthe data was mapped; and estimating the number of users of the websitebased on the activated bits of the array of bits.
 4. The method of claim3, further comprising: determining a period of time for which the numberof users is to be estimated; dividing the period of time into intervals;providing a plurality of arrays of bits corresponding to the number ofintervals; associating each array of bits of the plurality of arrays ofbits with one interval, such that for each interval one array of bits isused; activating in each interval the bits of the associated array ofbits according to the data dependent on the entities that access thewebsite during the interval, such that each array of bits allows toestimate the number of users of the website that accessed the websiteduring the associated interval; and after the end of the period of time,determining an array of bits that allows to estimate the number of usersof the website that accessed the website during the period of time bycombining the arrays of bits for each interval using a logicalOR-operation between the bits of the arrays of bits having the samesignificance.
 5. The method of claim 3 wherein the mapping function mapsthe data dependent on the entity based on a uniform distribution.
 6. Themethod of claim 3 wherein the mapping function is non-linear, such thatthe number of data items dependent on the entities that is mapped ontoone bit varies in dependence on a magnitude of the data item.
 7. Themethod of claim 3 wherein estimating the number of users of the websitebased on the activated bits of the array of bits uses a maximumlikelihood analysis.
 8. The method of claim 3 wherein estimating thenumber of users of the website based on the activated bits of the arrayof bits comprises: providing an estimation function relating a number ofactivated bits with a number of users, determining a number of activatedbits by counting the number of activated bits of the array of bits, andestimating the number of users of the website based on the number ofactivated bits of the array of bits and the estimation function.
 9. Themethod of claim 1 wherein the data dependent on the entity is anidentification of the entity.
 10. A system for estimating a number ofusers of a website, the system comprising a data processing apparatusadapted to execute: determining that the website is accessed by anentity; determining data dependent on the entity that accesses thewebsite, in reaction to the access of the website; storing the data;lossy compressing the stored data; and estimating a number of users ofthe website based on the lossy compressed stored data.
 11. The system ofclaim 10 wherein the data processing apparatus is adapted to execute:determining a maximum amount of data to be stored; determining a set ofnumbers; repeatedly mapping the data dependent on the entities thataccess the website onto a number of the set of numbers according to amapping function; storing the resulting numbers; pruning the set ofnumbers and deleting the stored numbers that correspond to the prunednumbers of the set of numbers, when the maximum amount of data to bestored is exceeded; determining a relation between the size of the setof the not-pruned numbers and the size of the set of the pruned numbers;and estimating the number of users of the website based on the prunedstored data and the relation.
 12. The system of claim 10 wherein thedata processing apparatus comprises a storage configured to store anarray of bits, the data processing apparatus being adapted to execute:repeatedly mapping the data dependent on the entities that access thewebsite onto one bit of the array of bits according to a mappingfunction and for activating the bit of the array of bits onto which thedata was mapped, and estimating the number of users of the website basedon the activated bits of the array of bits.
13. The system of claim 12 wherein the data processing apparatus is adapted to execute: determining a period of time for which the number of users is to be estimated, dividing the period of time into intervals, providing a plurality of arrays of bits corresponding to the number of intervals, associating each array of bits of the plurality of arrays of bits with one interval, such that for each interval one array of bits is used, activating in each interval the bits of the associated array of bits according to the data dependent on the entities that access the website during the interval, such that each array of bits allows to estimate the number of users of the website that accessed the website during the associated interval, and after the end of the period of time, determining an array of bits that allows to estimate the number of users of the website that accessed the website during the period of time by combining the arrays of bits for each interval using a logical OR-operation between the bits of the arrays of bits having the same significance.
14. The system of claim 12 wherein the mapping function maps the data dependent on the entity based on a uniform distribution.
15. The system of claim 12 wherein the mapping function is non-linear, such that the number of data items dependent on the entities that is mapped onto one bit varies in dependence on a magnitude of the data item.
16. The system of claim 12 wherein, when estimating the number of users of the website based on the activated bits of the array of bits, the data processing apparatus is adapted to use a maximum likelihood analysis.
17. The system of claim 12 wherein, when estimating the number of users of the website based on the activated bits of the array of bits, the data processing apparatus is adapted to execute: storing an estimation function relating a number of activated bits with a number of users, determining a number of activated bits by counting the number of activated bits of the array of bits, and estimating the number of users of the website based on the number of activated bits of the array of bits and the estimation function.
18. The system of claim 10 wherein the data dependent on the entity is an identification of the entity.
19. A non-transitory computer readable medium having a computer program recorded therein, the computer program comprising a set of instructions which, when executed by a data processing apparatus, cause the data processing apparatus to execute: repeatedly determining that the website is accessed by an entity; in reaction to the access of the website, determining data dependent on the entity that accesses the website and storing the data; lossy compressing the stored data; and estimating a number of users of the website based on the lossy compressed stored data.
20. A method for estimating a number of users of a website, the method comprising: determining a period of time for which the number of users is to be estimated; dividing the period of time into intervals; providing a plurality of arrays of bits corresponding to the number of intervals; associating each array of bits of the plurality of arrays of bits with one interval, such that for each interval one array of bits is used; repeatedly determining that the website is accessed by an entity; in reaction to the access of the website, determining data dependent on the entity that accesses the website and storing the data; in each interval, repeatedly mapping the data dependent on the entity that accesses the website during the interval onto one bit of the array of bits that is associated with the interval according to a mapping function and activating the bit of the array of bits onto which the data was mapped, such that after the end of the period of time each array of bits allows to estimate the number of users of the website that accessed the website during the associated interval; after the end of the period of time, determining an array of bits that allows to estimate the number of users of the website that accessed the website during the period of time by combining the arrays of bits for each interval using a logical OR-operation between the bits of the arrays of bits having the same significance; and estimating the number of users of the website based on the activated bits of the array of bits.
21. A system for estimating a number of users of a website, the system comprising a data processing apparatus adapted to execute: determining a period of time for which the number of users is to be estimated; dividing the period of time into intervals; providing a plurality of arrays of bits corresponding to the number of intervals; associating each array of bits of the plurality of arrays of bits with one interval, such that for each interval one array of bits is used; repeatedly determining that the website is accessed by an entity; in reaction to the access of the website, determining data dependent on the entity that accesses the website and storing the data; in each interval, repeatedly mapping the data dependent on the entity that accesses the website during the interval onto one bit of the array of bits that is associated with the interval according to a mapping function and activating the bit of the array of bits onto which the data was mapped, such that after the end of the period of time each array of bits allows to estimate the number of users of the website that accessed the website during the associated interval; after the end of the period of time, determining an array of bits that allows to estimate the number of users of the website that accessed the website during the period of time by combining the arrays of bits for each interval using a logical OR-operation between the bits of the arrays of bits having the same significance; and estimating the number of users of the website based on the activated bits of the array of bits.
22. A non-transitory computer readable medium having a computer program recorded therein, the computer program comprising a set of instructions which, when executed by a data processing apparatus, cause the data processing apparatus to execute: determining a period of time for which the number of users is to be estimated; dividing the period of time into intervals; providing a plurality of arrays of bits corresponding to the number of intervals; associating each array of bits of the plurality of arrays of bits with one interval, such that for each interval one array of bits is used; repeatedly determining that the website is accessed by an entity; in reaction to the access of the website, determining data dependent on the entity that accesses the website and storing the data; in each interval, repeatedly mapping the data dependent on the entity that accesses the website during the interval onto one bit of the array of bits that is associated with the interval according to a mapping function and activating the bit of the array of bits onto which the data was mapped, such that after the end of the period of time each array of bits allows to estimate the number of users of the website that accessed the website during the associated interval; after the end of the period of time, determining an array of bits that allows to estimate the number of users of the website that accessed the website during the period of time by combining the arrays of bits for each interval using a logical OR-operation between the bits of the arrays of bits having the same significance; and estimating the number of users of the website based on the activated bits of the array of bits.
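
By way of illustration only, and without limiting the claims, the bit-array steps recited in claims 12, 13, 17 and 20 to 22 may be sketched as follows in Python. The function names, the SHA-256 based mapping function, and the particular estimation function n = -m * ln(1 - k/m) (for m bits of which k are activated, assuming a uniform mapping) are assumptions chosen for this example; the claims do not prescribe any of them.

```python
import hashlib
import math


def bit_index(entity_id: str, m: int) -> int:
    """Hypothetical mapping function: hash the identification of an entity
    onto one of m bit positions, approximately uniformly."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def record_access(bits: bytearray, entity_id: str) -> None:
    """Activate the bit onto which the entity's data is mapped.
    Repeated accesses by the same entity activate the same bit again."""
    bits[bit_index(entity_id, len(bits))] = 1


def combine(arrays: list) -> bytearray:
    """Combine per-interval arrays with a logical OR between bits
    at the same position (same significance)."""
    merged = bytearray(len(arrays[0]))
    for array in arrays:
        for i, bit in enumerate(array):
            merged[i] |= bit
    return merged


def estimate_users(bits: bytearray) -> float:
    """Assumed estimation function relating the number of activated
    bits k to a number of users: n = -m * ln(1 - k/m)."""
    m = len(bits)
    k = sum(bits)  # count the activated bits
    if k == m:
        raise ValueError("array saturated; a larger array is needed")
    return -m * math.log(1.0 - k / m)
```

In this sketch each array is a bytearray holding one byte per bit for readability; a packed bit field would serve equally. Because the OR-combination of the interval arrays equals the array that would have resulted from recording all accesses in a single array, an entity that accesses the website in several intervals is counted once for the whole period.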